robot programming


AutoMisty: A Multi-Agent LLM Framework for Automated Code Generation in the Misty Social Robot

Wang, Xiao, Dong, Lu, Rangasrinivasan, Sahana, Nwogu, Ifeoma, Setlur, Srirangaraj, Govindaraju, Venugopal

arXiv.org Artificial Intelligence

Misty's open API allows users to customize open-domain interactions; however, it remains inaccessible to those without programming experience. In this work, we introduce AutoMisty, the first multi-agent collaboration framework powered by large language models (LLMs), enabling the seamless generation of executable Misty robot code from natural language instructions. AutoMisty comprises four specialized agent modules that manage task decomposition, assignment, problem-solving, and result synthesis. Each agent employs a two-layer optimization mechanism, combining self-reflection for iterative refinement with human-in-the-loop feedback for better alignment with user preferences. AutoMisty ensures a transparent reasoning process, allowing users to iteratively refine tasks through natural language feedback for precise execution. To evaluate AutoMisty's effectiveness, we designed a benchmark task set spanning four levels of complexity and conducted experiments in a real Misty robot environment. Extensive evaluations demonstrate that AutoMisty not only consistently generates high-quality code but also enables precise code control, significantly outperforming direct reasoning with ChatGPT-4o and ChatGPT-o1. All code, optimized APIs, and experimental videos will be publicly released through the webpage: https://wangxiaoshawn.github.io/AutoMisty.html
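The two-layer optimization described above (self-reflection plus human-in-the-loop) can be pictured as a generic refinement loop. The sketch below is illustrative only: the function names, callback signatures, and control flow are assumptions, not AutoMisty's actual implementation.

```python
def refine_with_reflection(generate, critique, accept, max_rounds=3):
    """Generic two-layer refinement loop: an inner self-reflection pass
    followed by an outer human-in-the-loop check.

    generate(feedback) -> candidate code string
    critique(code)     -> list of issues the agent finds in its own output
    accept(code)       -> True if the human user approves, otherwise a
                          natural-language correction string
    """
    feedback = None
    code = ""
    for _ in range(max_rounds):
        code = generate(feedback)
        issues = critique(code)          # layer 1: self-reflection
        if issues:
            feedback = "; ".join(issues)
            continue
        verdict = accept(code)           # layer 2: human-in-the-loop
        if verdict is True:
            return code
        feedback = verdict
    return code
```

Each round either feeds the agent's own critique back into generation or, once the output passes self-review, surfaces it to the user for approval or a natural-language correction.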


Optimizing Robot Programming: Mixed Reality Gripper Control

Rettinger, Maximilian, Hacker, Leander, Wolters, Philipp, Rigoll, Gerhard

arXiv.org Artificial Intelligence

Conventional robot programming methods are complex and time-consuming for users. In recent years, alternative approaches such as mixed reality have been explored to address these challenges and optimize robot programming. While the findings of the mixed reality robot programming methods are convincing, most existing methods rely on gesture interaction for robot programming. Since controller-based interactions have proven to be more reliable, this paper examines three controller-based programming methods within a mixed reality scenario: 1) Classical Jogging, where the user positions the robot's end effector using the controller's thumbsticks, 2) Direct Control, where the controller's position and orientation directly correspond to the end effector's, and 3) Gripper Control, where the controller is enhanced with a 3D-printed gripper attachment to grasp and release objects. A within-subjects study (n = 30) was conducted to compare these methods. The findings indicate that the Gripper Control condition outperforms the others in terms of task completion time, user experience, mental demand, and task performance, while also being the preferred method. Therefore, it demonstrates promising potential as an effective and efficient approach for future robot programming. Video available at https://youtu.be/83kWr8zUFIQ.


ImageInThat: Manipulating Images to Convey User Instructions to Robots

Mahadevan, Karthik, Lewis, Blaine, Li, Jiannan, Mutlu, Bilge, Tang, Anthony, Grossman, Tovi

arXiv.org Artificial Intelligence

Foundation models are rapidly improving the capability of robots to perform everyday tasks autonomously, such as meal preparation, yet robots will still need to be instructed by humans due to limits of model performance, the difficulty of capturing user preferences, and the need for user agency. Robots can be instructed using various methods: natural language conveys immediate instructions but can be abstract or ambiguous, whereas end-user programming supports longer-horizon tasks but its interfaces face difficulties in capturing user intent. In this work, we propose using direct manipulation of images as an alternative paradigm to instruct robots, and introduce a specific instantiation called ImageInThat, which allows users to directly manipulate images in a timeline-style interface to generate robot instructions. Through a user study, we demonstrate the efficacy of ImageInThat for instructing robots in kitchen manipulation tasks, comparing it to a text-based natural language instruction method. The results show that participants were faster with ImageInThat and preferred to use it over the text-based method. Supplementary material, including code, can be found at: https://image-in-that.github.io/. Advances in foundation models are rapidly improving the capabilities of autonomous robots, bringing us closer to robots entering our homes where they can complete everyday tasks. However, the need for human instructions will persist, whether due to limitations in robot policies, models trained on internet-scale data that may not capture the specifics of users' environments or preferences, or simply the desire of users to maintain control over their robots' actions.
For instance, a robot asked to wash dishes might follow a standard cleaning routine (e.g., placing everything in the dishwasher and then putting it away in the cupboard) but may not respect a user's preferences (e.g., needing to wash delicate glasses "by hand" or organizing cleaned dishes in a specific way), thus necessitating human intervention. We introduce a new paradigm for instructing robots through the direct manipulation of images. ImageInThat is a specific instantiation of this paradigm in which users manipulate images in a timeline-style interface to create instructions for the robot to execute. Existing methods for instructing robots range from those that command the robot for immediate execution (e.g., uttering a language instruction to wash glasses by hand [1]) to methods that program the robot, such as learning from demonstration [2] or end-user robot programming [3]. However, prior methods, whether used for commanding or programming, have notable drawbacks.


Cocobo: Exploring Large Language Models as the Engine for End-User Robot Programming

Ge, Yate, Dai, Yi, Shan, Run, Li, Kechun, Hu, Yuanda, Sun, Xiaohua

arXiv.org Artificial Intelligence

End-user development allows everyday users to tailor service robots or applications to their needs. One user-friendly approach is natural language programming. However, it encounters challenges such as an expansive user expression space and limited support for debugging and editing, which restrict its application in end-user programming. Large language models (LLMs) offer promising avenues for translating between human language instructions and the code executed by robots, but their application in end-user programming systems requires further study. We introduce Cocobo, a natural language programming system with interactive diagrams powered by LLMs. Cocobo employs LLMs to understand users' authoring intentions, generate and explain robot programs, and facilitate the conversion between executable code and flowchart representations. Our user study shows that Cocobo has a low learning curve, enabling even users with zero coding experience to customize robot programs successfully.


Immersive Robot Programming Interface for Human-Guided Automation and Randomized Path Planning

Malek, Kaveh, Danielson, Claus, Moreu, Fernando

arXiv.org Artificial Intelligence

Researchers are exploring Augmented Reality (AR) interfaces for online robot programming to streamline automation and user interaction in variable manufacturing environments. This study introduces an AR interface for online programming and data visualization that integrates the human into randomized robot path planning, reducing the inherent randomness of these methods through human intervention. The interface uses holographic items corresponding to physical elements to interact with a redundant manipulator. Using the Rapidly-exploring Random Tree Star (RRT*) and Spherical Linear Interpolation (SLERP) algorithms, the interface achieves the end-effector's progression along a collision-free path with smooth rotation. Sequential Quadratic Programming (SQP) then computes the robot's configurations for this progression. The platform executes the RRT* algorithm in a loop, with each iteration independently exploring the shortest path through random sampling, leading to variations in the optimized paths produced. These paths are then shown to AR users, who select the most appropriate path based on the environmental context and their intuition. The accuracy and effectiveness of the interface are validated through its implementation and testing with a seven-degree-of-freedom (DOF) manipulator, indicating its potential to advance current practices in robot programming. The validation in this paper includes two implementations demonstrating the value of human-in-the-loop and context awareness in robotics.
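As an illustration of the rotation-smoothing step, here is a minimal SLERP sketch for unit quaternions. This is the generic textbook formulation, not the paper's implementation, and the quaternion convention (w, x, y, z) is an assumption for the example.

```python
import math

def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions q0 and q1.

    q0, q1: tuples (w, x, y, z), assumed normalized; t in [0, 1].
    Returns the interpolated unit quaternion as a tuple.
    """
    dot = sum(a * b for a, b in zip(q0, q1))
    # Take the shorter arc: negate one endpoint if the dot product is negative.
    if dot < 0.0:
        q1 = tuple(-c for c in q1)
        dot = -dot
    # For nearly parallel quaternions, fall back to normalized linear
    # interpolation to avoid dividing by a vanishing sin(theta).
    if dot > 0.9995:
        out = tuple(a + t * (b - a) for a, b in zip(q0, q1))
        norm = math.sqrt(sum(c * c for c in out))
        return tuple(c / norm for c in out)
    theta = math.acos(dot)  # angle between the two quaternions
    s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))
```

Interpolating halfway between the identity and a 90-degree rotation about z yields the 45-degree rotation, which is the constant-angular-velocity property that makes SLERP suitable for smooth end-effector reorientation.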


Offline robot programming assisted by task demonstration: an AutomationML interoperable solution for glass adhesive application and welding

Babcinschi, M., Cruz, F., Duarte, N., Santos, S., Alves, S., Neto, P.

arXiv.org Artificial Intelligence

Robots have been successfully deployed in both traditional and novel manufacturing processes. However, they are still difficult to program by non-experts, which limits their accessibility to a wider range of potential users. Programming robots requires expertise in both robotics and the specific manufacturing process in which they are applied. Robot programs created offline often lack parameters that represent relevant manufacturing skills when executing a specific task. These skills encompass aspects like robot orientation and velocity. This paper introduces an intuitive robot programming system designed to capture manufacturing skills from task demonstrations performed by skilled workers. Demonstration data, including orientations and velocities of the working paths, are acquired using a magnetic tracking system fixed to the tools used by the worker. Positional data are extracted from CAD/CAM. Robot path poses are transformed into Cartesian space and validated in simulation, subsequently leading to the generation of robot programs. PathML, an AutomationML-based syntax, integrates robot and manufacturing data across the heterogeneous elements and stages of the manufacturing systems considered. Experiments conducted on the glass adhesive application and welding processes showcased the intuitive nature of the system, with path errors falling within the functional tolerance range.


BANSAI: Towards Bridging the AI Adoption Gap in Industrial Robotics with Neurosymbolic Programming

Alt, Benjamin, Dvorak, Julia, Katic, Darko, Jäkel, Rainer, Beetz, Michael, Lanza, Gisela

arXiv.org Artificial Intelligence

Deep neural networks and subsymbolic learning have progressed tremendously over the past decade, producing increasingly promising results in the domain of program synthesis and robot control [1]. While the use of robots in the manufacturing industries is ubiquitous, the current degree of industry adoption of artificial intelligence-based robot program synthesis and optimization remains very limited, particularly with regard to deep learning (DL) [2]. This reflects a broader phenomenon in the manufacturing industry, where artificial intelligence (AI) adoption lags behind the academic state of the art, with a "lack of substantial evidence of industrial success" at technology readiness. In this paper, we propose that neurosymbolic programming - a principled combination of symbolic AI and deep learning (DL) for program representation, synthesis and optimization - can overcome this gap. We describe BANSAI (Bridging the AI Adoption Gap via Neurosymbolic AI), an approach for the application of neurosymbolic programming to industrial robotics. To that end, we contribute an analysis of the AI adoption gap, highlighting a mismatch between the requirements imposed by the industrial robot programming and deployment process and the exigencies of state-of-the-art AI-based manipulation, program synthesis and optimization approaches.


Forgetful Large Language Models: Lessons Learned from Using LLMs in Robot Programming

Chen, Juo-Tung, Huang, Chien-Ming

arXiv.org Artificial Intelligence

Large language models offer new ways of empowering people to program robot applications, namely code generation via prompting. However, the code generated by LLMs is susceptible to errors. This work reports a preliminary exploration that empirically characterizes common errors produced by LLMs in robot programming. We categorize these errors into two phases: interpretation and execution. In this work, we focus on errors in execution and observe that they are caused by LLMs being "forgetful" of key information provided in user prompts. Based on this observation, we propose prompt engineering tactics designed to reduce errors in execution. We then demonstrate the effectiveness of these tactics with three language models: ChatGPT, Bard, and LLaMA-2. Finally, we discuss lessons learned from using LLMs in robot programming and call for the benchmarking of LLM-powered end-user development of robot applications.
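The "forgetfulness" failure mode suggests tactics such as restating key constraints near the end of a prompt, close to where generation begins. The helper below is a minimal sketch of one such tactic; the function name and template wording are illustrative assumptions, not the tactics proposed in the paper.

```python
def build_reminder_prompt(task, constraints):
    """Compose a code-generation prompt that restates key constraints
    immediately before the final instruction, so they appear near the
    point of generation rather than only at the top of the prompt.
    """
    lines = [f"Task: {task}", "", "Constraints:"]
    for i, c in enumerate(constraints, 1):
        lines.append(f"  {i}. {c}")
    lines.append("")
    # Restate the constraints right before asking for code, a simple
    # guard against the model "forgetting" earlier context.
    lines.append("Reminder: your code MUST satisfy all of the following:")
    for c in constraints:
        lines.append(f"  - {c}")
    lines.append("")
    lines.append("Now write the robot program.")
    return "\n".join(lines)
```

Each constraint appears twice in the assembled prompt, once in the initial list and once in the closing reminder, so the final instruction is never far from the information the model tends to drop.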


Inside-out Infrared Marker Tracking via Head Mounted Displays for Smart Robot Programming

Puljiz, David, Vasilache, Alexandru-George, Mende, Michael, Hein, Björn

arXiv.org Artificial Intelligence

Intuitive robot programming through the use of tracked smart input devices relies on fixed, external tracking systems, most often employing infra-red markers. Such an approach is frequently combined with projector-based augmented reality for better visualisation and interface. The combined system, although providing an intuitive programming platform with short cycle times even for inexperienced users, is immobile, expensive and requires extensive calibration. When faced with a changing environment and a large number of robots, it becomes sorely impractical. Here we present our work on infra-red marker tracking using the Microsoft HoloLens head-mounted display. The HoloLens can map the environment, register the robot on-line, and track smart devices equipped with infra-red markers in the robot coordinate system. We envision our work providing the basis to transfer many of the paradigms developed over the years for systems requiring a projector and a tracked input device into a highly portable system that does not require any calibration or special set-up. We test the quality of the marker tracking in an industrial robot cell and compare our tracking with a ground truth obtained via an ART-3 tracking system.


Hiding task-oriented programming complexity: an industrial case study

Villagrossi, Enrico, Delledonne, Michele, Faroni, Marco, Beschi, Manuel, Pedrocchi, Nicola

arXiv.org Artificial Intelligence

The limited ease of use of robot programming interfaces represents a barrier to robot adoption in several manufacturing sectors because of the expertise it demands from end-users. Current robot programming methods are largely inherited from the past, with robot programmers reluctant to adopt new programming paradigms. This work aims to evaluate the impact on non-expert users of introducing a new task-oriented programming interface that hides the complexity of a programming framework based on ROS. The paper compares the programming performance of such an interface with a classic robot-oriented programming method based on a state-of-the-art robot teach pendant. An experimental campaign involved 22 non-expert users working on the programming of two industrial tasks. Task-oriented and robot-oriented programming showed comparable learning time, programming time and number of questions raised during the programming phases, highlighting the possibility of a smooth introduction to task-oriented programming even for non-expert users.